Distributed Association Rule Mining with Minimum Communication Overhead

Authors

  • Md. Golam Kaosar
  • Zhuojia Xu
  • Xun Yi
Abstract

In distributed association rule mining, one of the major challenges is reducing the communication overhead. Data sites must exchange a large amount of information during the mining process, which can generate massive communication overhead. In this paper we propose an association rule mining algorithm that minimizes the communication overhead among the participating data sites. Instead of transmitting all itemsets and their counts, we propose to transmit a binary vector and the counts of only the frequently large itemsets. The Message Passing Interface (MPI) technique is exploited to avoid broadcasting among data sites. A performance study shows that the proposed algorithm performs better than two other well-known algorithms, the Fast Distributed Algorithm for Mining Association Rules (FDM) and Count Distribution (CD), in terms of communication overhead.
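The idea of sending a binary vector plus the counts of only the large itemsets can be illustrated with a minimal sketch. The abstract does not give the message layout, so everything below is an assumption made for illustration: the function names `local_counts`, `encode_message`, and `decode_message` are hypothetical, all sites are assumed to agree on the order of the candidate itemsets, and the MPI transport is left out entirely.

```python
from itertools import combinations

def local_counts(transactions, candidates):
    """Count how many local transactions contain each candidate itemset."""
    return [sum(1 for t in transactions if c <= t) for c in candidates]

def encode_message(counts, min_count):
    """Assumed layout: one bit per candidate, plus counts for flagged ones only."""
    bits = [1 if n >= min_count else 0 for n in counts]
    kept = [n for n in counts if n >= min_count]
    return bits, kept

def decode_message(bits, kept, candidates):
    """Receiver side: map each flagged candidate back to its count."""
    remaining = iter(kept)
    return {c: next(remaining) for c, b in zip(candidates, bits) if b}

# Toy data for one site (hypothetical values).
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
candidates = [frozenset(p) for p in combinations(["a", "b", "c"], 2)]

counts = local_counts(transactions, candidates)
bits, kept = encode_message(counts, min_count=2)
decoded = decode_message(bits, kept, candidates)
```

In this toy run the site sends three bits and two counts instead of three itemsets with three counts. The saving grows with the share of candidates that fall below the threshold, since those cost one bit each rather than an itemset and a count.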

Similar resources

Optimization of Distributed Association Rule Mining Based Partial Vertical Partitioning

Association rule mining is one of the most important techniques in data mining. Data mining is the process of analyzing data from different angles and extracting useful information from it. Modern organizations are geographically distributed. Using traditional centralized association rule mining to discover useful patterns in such distributed systems is not always feasible because merging dat...


Implementation of Efficient Algorithm for Mining High Utility Itemsets in Distributed and Dynamic Database

Association Rule Mining (ARM) finds the frequent itemsets or patterns among the existing items in a given database. High Utility Pattern Mining has become a recent research topic in data mining. The proposed work addresses High Utility Patterns for distributed and dynamic databases. Traditional methods of frequent itemset mining assume that the data is astride and sedent...


A Survey on Efficient Incremental Algorithm for Mining High Utility Itemsets in Distributed and Dynamic Database

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. It can be defined as the activity of extracting information contained in very large databases. That information can be used to increase revenue or cut costs. Association Rule Mining (ARM) is finding out the frequent itemsets or patterns among the existing items from the given ...


Mining Frequent Itemsets in Distributed and Dynamic Databases

Traditional methods for frequent itemset mining typically assume that data is centralized and static. Such methods impose excessive communication overhead when data is distributed, and they waste computational resources when data is dynamic. In this paper we present what we believe to be the first unified approach that overcomes these assumptions. Our approach makes use of parallel and incremen...


Privacy preserving association rules mining on distributed homogenous databases

Privacy is one of the most important properties that an information system must satisfy. In these systems, information must be shared among different, untrusted entities, and the protection of sensitive information plays an important role. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques a...


Publication date: 2009